Abstract

The syntactic structure of sentences exhibits a striking regularity:
dependencies tend to not cross when drawn above the sentence. We
investigate two competing explanations. The traditional hypothesis is that
this trend arises from an independent principle of syntax that reduces
crossings practically to zero. An alternative to this view is the hypothesis
that crossings are a side effect of dependency lengths, i.e. sentences with
shorter dependency lengths should tend to have fewer crossings. We are
able to reject the traditional view in the majority of languages
considered. The alternative hypothesis can lead to a more parsimonious theory
of language.
projectivity.

Nontechnical, jargon-free summary: Syntactic relations between words
(e.g., the one that links a verb with its subject) exhibit a strong tendency to not
cross when drawn as arrows above the sentence. Traditionally, this has been
assumed to result from an independent principle of syntax that reduces crossings
practically to zero. An alternative view is that the trend arises naturally from
the preference in human languages for word orders that keep related words close
together. Our statistical analysis discards the traditional view in the majority
of languages considered. The alternative approach can lead to a simpler theory
of language.

26 pages, 2 figures and 3 tables. 


1 Introduction 


One of the main goals of complexity science is to provide parsimonious
explanations for statistical patterns that are observed in nature. Here we pay
attention to a striking regularity of the syntactic structure of sentences that
was reported in the 1960s: dependencies tend to not cross when drawn above
the sentence, as shown in Fig. 1. The absence of crossings is known as
planarity, a feature that is intimately related to another property of syntactic
dependency trees: projectivity. Projectivity is a particular case of planarity
where no dependency covers the root. Interestingly, real sentences that are
planar tend to be projective. Here we investigate two competing hypotheses
for the origins of non-crossing dependencies.

The traditional hypothesis is that the low frequency of dependency crossings
arises from an independent principle of syntax that reduces crossings practically
to zero. This view is held by theories of grammar where crossings are not
allowed and also by parsing frameworks where crossing dependencies
are not allowed or are subject to hard constraints. It is also shared by
research on dependency length minimization where annotations with crossings
are discarded [17] and actual dependency lengths are compared with two kinds
of baselines where crossings are not allowed or are subject to hard constraints:
random orderings and optimal dependency lengths [17]. The traditional view
is convenient for simplicity and computational reasons, since efficient algorithms
for non-crossing dependencies or limited violations are available [23, 24], and it
is justified by the low frequency of crossings in real languages [18].


An alternative to this view is the hypothesis that crossings are a side
effect of dependency lengths [26]. This hypothesis predicts that dependencies
should tend to not cross, combining a tendency for shorter dependency lengths
to have fewer crossings and the fact that dependencies are actually short. This
challenges the dogma that unconstrained dependency length minimization “does
not take into account constraints of projectivity or mild context-sensitivity”,
and is coherent with the trends towards diachronic reduction of the proportion
of crossings in conjunction with dependency length minimization that have been
observed in English and, more recently, in Latin and Ancient Greek.

Here we will evaluate these two hypotheses with an emphasis on the validity
of the traditional view. We will formalize the traditional view as a null
hypothesis and the alternative view as an alternative hypothesis. With the help of a
collection of dependency treebanks of thirty different languages, we will show
that the null hypothesis of the traditional view is rejected for a large majority
of treebanks.


2 Formalization of the problem 

Suppose that C is the number of crossings of a sentence and that n is its
number of words. We define $E_{TB}[C \mid n, D]$ as the expectation of C conditioning on
sentences of a treebank (TB) that have length n and whose sum of dependency
lengths is D. Then the traditional view can be recast simply as

$$E_{TB}[C \mid n, D] = a_{TB}(n), \tag{1}$$

where $a_{TB}(n)$ is a constant with respect to D that depends on n. For the particular
case of a complete ban on crossings, $a_{TB}(n) = 0$ for all n. The Appendix
provides a derivation of Eq. 1, including a detailed explanation of why $a_{TB}$
depends on n in general. Notice that $a_{TB}(n)$ is constant for all trees of length
n and bear in mind that we will test the traditional view on sentences of the same length.

[Figure 1 diagram; one of the sentences shown is “We keep wondering what Mr. Gates wanted to say”.]

Figure 1: Two sentences with Stanford annotations from HamleDT 2.0 [28].
Dependencies are labelled with their length (in tokens). For the sentence on top, the sum
of dependency lengths is D = 23 and the number of crossings is C = 0; D = 24 and
C = 1 for the sentence at the bottom.
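As an illustration of the two quantities defined in this section, the following minimal Python sketch computes D and C from a list of head positions. The representation (1-based word positions, with 0 marking the root) and the function names are assumptions made for this example; they are not taken from HamleDT or from the code behind the analysis.

```python
from itertools import combinations


def dependency_edges(heads):
    """Return each dependency as a (left, right) pair of word positions.

    heads[i] is the position of the head of word i + 1 (positions are
    1-based); the root has head 0 and contributes no edge.
    """
    return [(min(i, h), max(i, h))
            for i, h in enumerate(heads, start=1) if h != 0]


def sum_dependency_lengths(heads):
    """D: the sum of the distances (in words) spanned by the dependencies."""
    return sum(right - left for left, right in dependency_edges(heads))


def count_crossings(heads):
    """C: the number of pairs of dependencies that cross.

    Two edges cross when exactly one endpoint of one edge lies strictly
    between the endpoints of the other; edges sharing a word never cross.
    """
    crossings = 0
    for (a, b), (c, d) in combinations(dependency_edges(heads), 2):
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


# Hypothetical 4-word sentence: word 3 is the root, words 1 and 4 depend
# on word 3, and word 2 depends on word 4.
toy_heads = [3, 4, 0, 3]
print(sum_dependency_lengths(toy_heads), count_crossings(toy_heads))
```

The crossing count checks all pairs of edges, which is quadratic in sentence length; that is adequate for an illustration, although faster methods exist.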

The fact that $a_{TB}(n) = E_{TB}[C \mid n]$ allows one to formulate the traditional
view equivalently as

$$E_{TB}[C \mid n, D] = E_{TB}[C \mid n]. \tag{2}$$
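The identity between the constant of the traditional view and the mean number of crossings at a given length can be checked with the law of total expectation. The following is a brief sketch in the notation of this section, assuming that the traditional view holds for every value of D at the given n:

```latex
\begin{align*}
E_{TB}[C \mid n]
  &= E_{TB}\big[\, E_{TB}[C \mid n, D] \;\big|\; n \,\big] \\
  &= E_{TB}\big[\, a_{TB}(n) \;\big|\; n \,\big] \\
  &= a_{TB}(n).
\end{align*}
```

The first step averages the conditional expectation over the values of D observed at length n, the second applies the traditional view, and the third uses the fact that $a_{TB}(n)$ does not depend on D.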

Thus, given a treebank and a sentence length n, the traditional hypothesis
predicts that a sentence will have, on average, a number of crossings that coincides
with the mean number of crossings of the sentences of length n. Accordingly,
the alternative view is modeled by

$$E_{TB}[C \mid n, D] = g_{TB}(n, D), \tag{3}$$

where $g_{TB}(n, D)$ is a strictly monotonically increasing function of D when n
remains constant. In this article, we want to remain agnostic about the exact
mathematical form of $g_{TB}(n, D)$. Our focus is on the validity of the traditional
view. Concerning the alternative view, we are only interested in the sign of
the correlation between C and D. A positive correlation provides support for
the hypothesis that crossings are a side effect of dependency lengths. Note
that a positive correlation between D and C has been shown empirically in
real syntactic dependency trees, but assuming unrealistic word orders (in
particular, uniformly random linear arrangements) [26]. This correlation has been
supported using theoretical arguments that show that reducing the length of
a dependency is likely to imply a reduction in the probability that two edges
cross, assuming random arrangements that are also unrealistic [25]. The
limitations of previous research on the hypothesis raise the question of whether such
a correlation still holds when considering linear arrangements that actually
occur. For the first time, here we will investigate the correlation between D
and C involving their joint distribution in real linear arrangements of syntactic
dependency trees. Put differently, here we are testing a new condition that is
vital to evaluate the hypothesis that C is a side effect of dependency lengths,
and not a consequence of an autonomous principle of syntax that disallows or
bounds crossings.

Eq. 2 is interesting because it indicates that the traditional view is equivalent
to C being mean independent of D when n is given, in the language of probability
theory. From the perspective of statistical hypothesis testing, the
traditional view is a null hypothesis (mean independence), while the alternative
view (a positive correlation between D and C) is an alternative hypothesis.

Although the autonomous bound on crossings has never been explicitly
formulated as in Eq. 1 or Eq. 2, a mathematical definition that can be used for testing
following standard statistical methods is not forthcoming. In the game of
science, hypotheses must be precise enough to be falsified. One could argue
that Eq. 1 or Eq. 2 are a particular interpretation of an autonomous bound on
crossings, perhaps a very narrow one. However, it is easy to show that a ban on
crossings, i.e. $a_{TB}(n) = 0$, and the condition $E_{TB}[C \mid n, D] = 0$ are equivalent once
one focuses on sentences of the same length:

• If $E_{TB}[C \mid n, D] = 0$ then C = 0 for any tree of n vertices because $C \geq 0$
by definition.

• If C = 0 for any tree of n vertices, then $E_{TB}[C \mid n, D] = 0$ obviously.

The null hypothesis with $a_{TB}(n) > 0$ (Eq. 1) is simply a relaxation of the ban.

Fig. 2 compares the relationship between D and C in sentences of length 18
in an English dependency treebank. In this case, the traditional view is

$$E_{TB}[C \mid n, D] = a_{TB}(n) \tag{4}$$

with $a_{TB}(n) = 0.08$, the mean number of crossings in sentences of length 18 in
that treebank. This very low number casts doubts on the adequacy of the null
hypothesis for the large values of C that are found especially for large values of
D in Fig. 2. The Kendall $\tau$ correlation between C and D is $\tau = 0.03$ (p-value =
0.28), indicating a weak but positive tendency of C to increase as D increases.
In this article, we will study collections of sentences with syntactic dependency
annotations (treebanks) of different languages, to check if the number of positive
$\tau$ correlations across sentence lengths is significantly high. If that happens, we
will conclude that an autonomous bound on crossings does not hold in
general for that treebank.

It is tempting to think that the null hypothesis is impossible to satisfy and thus
its rejection is inevitable. However, notice three facts. First,
$E[C \mid n, D] = 0$ can be satisfied at least for the particular case that the trees are
star trees: in that case C = 0 while [32]

$$\frac{n^2 - n \bmod 2}{4} \leq D \leq \frac{n(n-1)}{2}. \tag{5}$$
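The bounds in Eq. 5 can be checked numerically. In a star tree every edge joins the hub to a leaf, so D depends only on the position of the hub in the sentence, and no two edges can cross because they all share the hub. The sketch below enumerates hub positions; the function names are chosen for this example only.

```python
def star_tree_D(n, hub):
    """Sum of dependency lengths of a star tree of n words whose hub
    occupies position hub (1-based); every other word attaches to the hub."""
    return sum(abs(position - hub) for position in range(1, n + 1))


def star_tree_D_range(n):
    """Smallest and largest D over all possible positions of the hub."""
    values = [star_tree_D(n, hub) for hub in range(1, n + 1)]
    return min(values), max(values)


# Compare the enumeration with the closed-form bounds of Eq. 5.
for n in range(2, 12):
    lower = (n * n - n % 2) // 4
    upper = n * (n - 1) // 2
    print(n, star_tree_D_range(n), (lower, upper))
```

The minimum is reached with the hub at the center of the sentence and the maximum with the hub at either end.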


Second, for any given treebank, the null hypothesis is also satisfied by any
reordering of the words in the sentences that enforces C = 0. Concrete
examples come from Hochberg & Stallmann’s algorithm, which provides minimum
linear arrangements without crossings [23], as well as the random and optimal
projective linearization algorithms employed in the dependency length research
reviewed in Section 1 (e.g., [17]).

Third, our analysis will show that the null hypothesis is not rejected
in every treebank (some preliminary evidence is provided by Fig. 2, which shows a
correlation between D and C that is not statistically significant).

We would like to emphasize that the goal of this article is not to predict the
actual number of crossings with great accuracy as in related work [33] but
to examine the validity of the customary assumption of an autonomous bound
on crossings with a simple (and statistically sound) approach. D is a rough
predictor of crossings because the probability that two dependencies cross is
determined by their individual lengths and whether they share vertices or not [25, 26].
D can be seen as a lossy compression of the dependency lengths of a sentence
into a single value. Furthermore, other factors such as chunking can have an
important role in the formation of crossings. Thus, it is rather surprising
that the rough predictions that D offers allow us to reject the traditional view
in the majority of treebanks, as we will see.
Figure 2: Crossings (C) versus sum of dependency lengths (D) in sentences of length
18 in an English treebank (we use Prague dependencies from HamleDT 2.0, see Section
3.1). 18 is the typical sentence length in this treebank. The average prediction made
by the null hypothesis is also shown (gray dashed line).




3 Materials and methods 


3.1 Materials 

We employ HamleDT 2.0, a collection of dependency treebanks of 30 different
languages. The collection provides sentences with syntactic dependency
annotations following two different criteria: Prague dependencies and Stanford
dependencies. This collection allows one to explore a set of typologically
diverse languages and control for the effect of annotation criteria.

Each syntactic dependency structure in the treebanks was preprocessed by 
removing nodes corresponding to punctuation tokens. To preserve the syntactic 
structure of the rest of the nodes, non-punctuation nodes that had a punctuation 
node as their head were attached as dependents of their nearest non-punctuation 
ancestor. Null elements, which appear in the Bengali, Hindi and Telugu corpora, 
were also subject to the same treatment as punctuation. 
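The reattachment rule described above can be sketched as follows. This is a minimal illustration, assuming that a sentence is given as a list of 1-based head positions (0 for the root) together with a parallel list of flags marking the tokens to remove (punctuation or null elements). The function name and the data layout are hypothetical and do not reflect the HamleDT file format.

```python
def remove_flagged_tokens(heads, flagged):
    """Drop flagged tokens and reattach their dependents.

    heads[i] is the head position of token i + 1 (0 means root) and
    flagged[i] is True when token i + 1 must be removed. Each kept token
    whose head is flagged is attached to its nearest kept ancestor.
    Returns the heads of the kept tokens, renumbered from 1.
    The input is assumed to be a valid tree (no cycles).
    """
    def nearest_kept_ancestor(position):
        head = heads[position - 1]
        while head != 0 and flagged[head - 1]:
            head = heads[head - 1]
        return head

    kept = [p for p in range(1, len(heads) + 1) if not flagged[p - 1]]
    new_position = {old: new for new, old in enumerate(kept, start=1)}
    new_position[0] = 0  # the root marker is unchanged
    return [new_position[nearest_kept_ancestor(p)] for p in kept]


# Hypothetical 5-token sentence: token 3 is punctuation attached to
# token 2, and token 4 depends on that punctuation token.
heads = [2, 0, 2, 3, 4]
flagged = [False, False, True, False, False]
print(remove_flagged_tokens(heads, flagged))
```

If a flagged token is the root, its dependents end up with head 0 and the result may have several roots; such a structure would no longer be a tree.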

After this preprocessing, syntactic dependency structures that did not define
a tree were removed. The reason is that we wanted to avoid the statistical
problem of mixing trees with other kinds of graphs, e.g. the potential number
of crossings depends on the number of edges [37, 38].

3.2 Methods

For each sentence length of a treebank, we want to investigate if the null
hypothesis that C is mean independent of D actually holds. This can be tested
with the help of the Kendall $\tau$ correlation between C and D. Suppose that $c_1$
and $c_2$ are two observations of C and $d_1$ and $d_2$ are two observations of D. Then
$(c_1, d_1)$ and $(c_2, d_2)$ are said to be concordant if $(c_1 - c_2)(d_1 - d_2) > 0$ (the ranks
of both elements agree), and discordant if $(c_1 - c_2)(d_1 - d_2) < 0$ (they disagree).
Then, the Kendall $\tau$ correlation is defined as

$$\tau = \frac{N_c - N_d}{N_0}, \tag{6}$$

where $N_c$ is the number of concordant pairs, $N_d$ is the number of discordant
pairs and $N_0$ is the total number of pairs.
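The definition in Eq. 6 translates directly into code. The sketch below is an illustration written for this text rather than the implementation used for the analysis; it divides by the total number of pairs, as Eq. 6 states, so tied pairs lower the magnitude of the result without affecting its sign.

```python
from itertools import combinations


def kendall_tau(c_values, d_values):
    """Kendall tau as in Eq. 6: (N_c - N_d) / N_0.

    c_values and d_values are the crossings and the sums of dependency
    lengths of the sentences of one sentence length, in the same order.
    """
    pairs = list(combinations(zip(c_values, d_values), 2))
    if not pairs:
        raise ValueError("tau is undefined for fewer than two sentences")
    concordant = sum(1 for (c1, d1), (c2, d2) in pairs
                     if (c1 - c2) * (d1 - d2) > 0)
    discordant = sum(1 for (c1, d1), (c2, d2) in pairs
                     if (c1 - c2) * (d1 - d2) < 0)
    return (concordant - discordant) / len(pairs)


# Hypothetical values for four sentences of the same length.
print(kendall_tau([0, 0, 1, 2], [20, 22, 25, 30]))
```

Variants of the coefficient that correct for ties use a different denominator but the same numerator, so they agree with this sketch on whether the correlation is positive, zero or negative.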

For each treebank, we calculated the Kendall $\tau$ correlation between D and
C for every sentence length n. Sentence lengths that met at least one of the
following conditions were excluded from the analysis:

• n < 4, because C = 0 for them [37].

• Lengths that were represented by fewer than two sentences, because $N_0 = 0$
and then $\tau$ is not properly defined.


Then we calculated $p(\tau \geq 0)$, the proportion of sentence lengths where $\tau \geq 0$.
If $p(\tau \geq 0)$ is sufficiently high then the null hypothesis of mean independence
is rejected. The significance of $p(\tau \geq 0)$ was determined with the help of a
Monte Carlo method that takes as input the vectors $D^n = \{d_1^n, \ldots, d_i^n, \ldots\}$
and $C^n = \{c_1^n, \ldots, c_i^n, \ldots\}$ of every sentence length n ($d_i^n$ and $c_i^n$ are,
respectively, the sum of dependency lengths and number of crossings of the i-th
sentence of length n). This method consists of generating T randomizations of
the input vectors and estimating the p-value of the test as the proportion of
times that $p_c(\tau \geq 0) \geq p(\tau \geq 0)$, where $p_c(\tau \geq 0)$ is the value of $p(\tau \geq 0)$
obtained in a randomization, over the T randomizations of the vectors. A randomization
consists of replacing the vector $D^n$ for each sentence length with a uniformly random
permutation of itself. For this article, we use $T = 10^4$ and a significance level of 0.05.

We could have determined the significance of $p(\tau \geq 0)$ by means of a binomial
test: under the assumption of independence between D and C and assuming
that there are no ties among values, the probability that $\tau > 0$ is 1/2 [40].
However, ties of C abound (many sentences have C = 0, see also Table 2). For
this reason, the Monte Carlo test above yields a more accurate estimation of
the true p-value.

It is convenient to split $p(\tau \geq 0)$ as $p(\tau > 0) + p(\tau = 0)$ and inspect $p(\tau = 0)$
because Kendall $\tau = 0$ is due to $N_c = N_d$ (recall Eq. 6). High p-values of
$p(\tau \geq 0)$ could be due to high $p(\tau = 0)$, which in turn would be due to C = 0
for many sentence lengths. To see it, consider the following extreme case: a
treebank where C = 0 in all sentences. In that case, $N_c = N_d = 0$ for all
sentence lengths and then $p(\tau \geq 0) = p(\tau = 0)$. Interestingly, $\tau$ would remain
zero for all sentence lengths after randomization and then the p-value of the
Monte Carlo test would be 1. That is the case of the Romanian treebank
with Prague dependencies (Tables 1 and 2).
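The Monte Carlo procedure of this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the article: the input is assumed to be a list with one (C vector, D vector) pair per sentence length, already filtered by the exclusion criteria above, and the names, the default number of randomizations and the fixed seed are choices made for this example.

```python
import random
from itertools import combinations


def sign(x):
    return (x > 0) - (x < 0)


def tau_numerator(c_values, d_values):
    """N_c - N_d of Eq. 6; its sign is the sign of Kendall tau."""
    return sum(sign(c1 - c2) * sign(d1 - d2)
               for (c1, d1), (c2, d2)
               in combinations(zip(c_values, d_values), 2))


def proportion_nonnegative(groups):
    """Proportion of sentence lengths where tau is zero or positive."""
    return sum(1 for c, d in groups if tau_numerator(c, d) >= 0) / len(groups)


def monte_carlo_p_value(groups, T=1000, seed=0):
    """Estimate the p-value of the observed proportion of lengths with
    tau >= 0 by permuting the D vector of every sentence length."""
    rng = random.Random(seed)
    observed = proportion_nonnegative(groups)
    hits = 0
    for _ in range(T):
        randomized = []
        for c_values, d_values in groups:
            permuted = list(d_values)
            rng.shuffle(permuted)  # uniformly random permutation of D
            randomized.append((c_values, permuted))
        if proportion_nonnegative(randomized) >= observed:
            hits += 1
    return hits / T


# Extreme case discussed above: C = 0 in every sentence.
all_planar = [([0, 0, 0], [10, 12, 15]), ([0, 0], [20, 31])]
print(monte_carlo_p_value(all_planar))
```

In the all-planar example every randomization leaves the statistic unchanged, so the estimated p-value is 1, which matches the extreme case described in the text.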


4 Results 

Table 1 shows that $p(\tau \geq 0)$ is significantly high in about three fourths of the
languages for Prague dependencies (eight treebanks have a p-value above the
significance level) and to a much larger extent for the Stanford dependencies
(only five treebanks have a p-value above the significance level). Thus, there is
a minority of languages where there is not enough support for the hypothesis
that crossing dependencies are a side effect of dependency lengths. Interestingly,
$p(\tau = 0)$ is especially high in the treebanks where $p(\tau \geq 0)$ is not significantly
high. A possible explanation for the failure of the alternative view in those
treebanks is that C = 0 in the majority of sentence lengths. Let us call $p_0$
the proportion of sentence lengths where all sentences have C = 0. Table 2
indicates that the five treebanks where $p(\tau \geq 0)$ is not significantly high for
Stanford dependencies coincide with the five treebanks with the largest $p_0$. The
situation for Prague dependencies is similar: the top six largest values of $p_0$
are taken by six treebanks where $p(\tau \geq 0)$ is not significantly high. In the
treebanks where $p(\tau \geq 0)$ is not significantly high we have $p(\tau = 0) = p_0$ in
practically all cases, although $p(\tau = 0) \geq p_0$ a priori. Indeed, the average $p_0$ is
significantly high in the subset of the treebanks where the null hypothesis could
not be rejected (Table 3).

                 Prague                               Stanford
Treebank         M    p(τ = 0)  p(τ > 0)  p-value     M    p(τ = 0)  p(τ > 0)  p-value
Arabic           90   0.38      0.34      0.239       90   0.067     0.7       0.0001
Basque           33   0.091     0.85      < 10^-4     33   0.03      0.82      0.0001
Bengali          16   0.19      0.38      0.7372      17   0.29      0.53      0.0871
Bulgarian        51   0.078     0.69      0.0006      52   0.077     0.77      < 10^-4
Catalan          86   0.16      0.76      < 10^-4     86   0.047     0.72      < 10^-4
Czech            73   0.027     0.78      < 10^-4     74   0.054     0.76      < 10^-4
Danish           56   0.11      0.62      0.005       57   0.07      0.77      < 10^-4
Dutch            52   0.019     0.81      < 10^-4     52   0         0.85      < 10^-4
English          66   0.11      0.64      0.0001      66   0.045     0.64      0.0089
Estonian         22   0.82      0.14      0.3027      22   0.45      0.14      0.9877
Finnish          33   0.15      0.79      < 10^-4     33   0.091     0.88      < 10^-4
German           72   0.042     0.71      < 10^-4     72   0.014     0.64      0.0151
Greek (ancient)  53   0         0.94      < 10^-4     53   0         0.89      < 10^-4
Greek (modern)   63   0.24      0.49      0.0358      64   0.14      0.66      0.0001
Hindi            58   0.069     0.78      < 10^-4     58   0.086     0.74      < 10^-4
Hungarian        62   0.032     0.74      < 10^-4     61   0.049     0.64      0.0048
Italian          59   0.34      0.53      0.0002      59   0.12      0.73      < 10^-4
Japanese         37   0.97      0         1           37   0         0.95      < 10^-4
Latin            46   0         0.72      0.0042      46   0.043     0.8       < 10^-4
Persian          71   0.028     0.25      0.9999      72   0.042     0.76      < 10^-4
Portuguese       79   0.063     0.71      < 10^-4     79   0.089     0.75      < 10^-4
Romanian         38   1         0         1           38   0.21      0.5       0.1266
Russian          66   0.076     0.76      < 10^-4     65   0.031     0.83      < 10^-4
Slovak           74   0.068     0.64      0.0008      76   0.026     0.75      < 10^-4
Slovene          46   0.087     0.59      0.027       49   0.041     0.73      0.0001
Spanish          81   0.16      0.79      < 10^-4     80   0.037     0.84      < 10^-4
Swedish          59   0.1       0.81      < 10^-4     61   0.033     0.75      < 10^-4
Tamil            31   0.84      0.13      0.2015      31   0.68      0.26      0.0514
Telugu           8    0.88      0.12      0.5315      8    0.62      0.38      0.1727
Turkish          43   0.07      0.67      0.0033      44   0.11      0.75      < 10^-4

Table 1: Summary of the analysis of the correlation between D and C. For every
treebank, we show the number of different sentence lengths considered (M), the
proportion of sentence lengths where Kendall $\tau$ is equal to or greater than zero
($p(\tau = 0)$ and $p(\tau > 0)$, respectively), and the p-value of the Monte Carlo test
for the significance of $p(\tau \geq 0) = p(\tau > 0) + p(\tau = 0)$.
Prague                      Stanford
Treebank          p_0       Treebank          p_0
Romanian*         1         Tamil*            0.68
Japanese*         0.97      Telugu*           0.62
Telugu*           0.88      Estonian*         0.45
Tamil*            0.84      Bengali*          0.29
Estonian*         0.82      Romanian*         0.21
Arabic*           0.34      Turkish           0.11
Italian           0.32      Greek (modern)    0.11
Greek (modern)    0.22      Finnish           0.091
Bengali*          0.19      Hindi             0.086
Catalan           0.16      Italian           0.051
Spanish           0.14      Bulgarian         0.038
Finnish           0.12      Catalan           0.035
Danish            0.11      Arabic            0.033
Swedish           0.085     Basque            0.03
Turkish           0.07      Spanish           0.025
Russian           0.061     Latin             0.022
Bulgarian         0.059     Danish            0.018
Hindi             0.052     Hungarian         0.016
English           0.045     Russian           0.015
Hungarian         0.032     English           0.015
Basque            0.03      Portuguese        0.013
Portuguese        0.025     Czech             0
Slovene           0.022     Dutch             0
Persian*          0.014     German            0
Czech             0         Greek (ancient)   0
Dutch             0         Japanese          0
German            0         Persian           0
Greek (ancient)   0         Slovak            0
Latin             0         Slovene           0
Slovak            0         Swedish           0

Table 2: $p_0$, the proportion of sentence lengths where C = 0 for all sentences.
Treebanks are sorted decreasingly by $p_0$. The treebanks where the null hypothesis could
not be rejected according to Table 1 are marked with an asterisk.


        Prague                                    Stanford
        mean     left p-value  right p-value      mean    left p-value  right p-value
p_0     0.8      1             10^-?              0.4     1             8 × 10^-?
S       2128.7   10^-?         1                  1288    6.5 × 10^-?   1
M       37.7     0.016         0.98               23.2    3.4 × 10^-?   1
⟨n⟩     11.9     0.038         0.96               8.6     1.6 × 10^-?   1

Table 3: A meta-analysis of the subset of treebanks where $p(\tau \geq 0)$ is not significantly
high with the help of one-sided Fisher randomization tests on the mean of a given
treebank feature over that subset [39]. Four features are considered: $p_0$ (the proportion
of sentence lengths where all sentences are planar), S (the number of sentences), M
(the number of different sentence lengths) and ⟨n⟩ (the mean length of sentences).
p-values were estimated with the help of a Monte Carlo procedure over 10^? replicas
and then rounded to leave only two significant digits. Means were rounded to leave
only one decimal.








5 Discussion 


We have rejected the traditional hypothesis of crossings as being constrained
independently from the dependency lengths in a large majority of treebanks (47
out of 60) thanks to a positive correlation between crossings (C) and
dependency lengths (D) that holds across sentence lengths. The fact that the number
of treebanks where the null hypothesis could not be rejected depends on the
annotation style (eight treebanks for Prague dependencies, five treebanks for
Stanford dependencies) suggests that annotation criteria are crucial. Indeed, we
have seen that there is a strong tendency for C = 0 across sentence lengths in
those treebanks (Table 2).

Before concluding prematurely that the minority of languages where the
traditional view could not be rejected constitute evidence of an autonomous
ban of crossings, some words of caution are necessary. First, we should reflect
on the influence that syntactic dependency annotation criteria have had on the
results due to:

• A belief in a ban of crossings [42] or a principle of minimization of
crossings.

• Automatic conversions from phrase structure grammar to dependency
treebanks [44], where crossings could be less likely with respect to
direct annotations based on dependency grammar.

• Annotation by automatic parsing followed by manual revision [45], which
can be biased due to either the parser not supporting crossings, or just
having low recall for crossing dependencies, a common limitation even in
modern non-projective parsers [7, 46].

• The need to avoid crossings to facilitate parsing by computers, as
treebanks and annotation guidelines are often developed with this goal in
mind.

• Cognitive considerations: dependency structures with fewer crossings
being easier to understand by humans [49, 50].

• Aesthetic considerations: dependency structures without crossings being
considered nicer than structures with crossings (see [51] and references
therein). These preferences are supported by the cognitive considerations
above.


Second, we should also reflect on statistical factors:

• The limited capacity of D to predict crossings discussed above (Section 2).

• Insufficient sampling: the number of sentences (S) and the number of
different sentence lengths (M) are significantly small in the subset of the
treebanks where $p(\tau \geq 0)$ is not significantly high (Table 3).

• A low mean sentence length. The point is that the chances for crossings
are a priori lower in shorter sentences for various reasons. On the one
hand, the size of the set of edges that may potentially cross grows with
sentence length in general (Eq. 11 in the Appendix). On the other hand,
the combination of three facts, i.e.

  - The well-known tendency of D to decrease as sentence length
    decreases [52, 54]

  - True values of D are below chance [52]

  - The reduction of the probability that two dependencies cross by
    chance as they shorten (provided that they are sufficiently short)

suggests that the abundance of C = 0 in some treebanks could be a side
effect of the principle of dependency length minimization [32], rather than
an external imposition. This possibility is supported by the significantly
low mean sentence length that is found in the subset of treebanks where
$p(\tau \geq 0)$ is not significantly high (Table 3). However, this issue should
be the subject of future research because dependency length minimization
could be beaten by other word order principles at short sentence lengths.


Halfway between annotation and statistical factors we find the decision of some
treebanks’ annotators to break complex sentences into simple clauses [47]. This
procedure removes long-distance dependencies, reduces mean sentence length
and, for the reasons reviewed above, could reduce the chance of crossings. By
having examined a series of statistical caveats, we do not mean that they are the
ultimate reason for the failure to reject the null hypothesis in some languages.
Those factors, e.g., mean sentence length, could be influenced by aspects such
as modality (oral versus written) or the genre of the sources used for the
treebanks. However, controlling for these aspects is beyond the scope of
this article. For these reasons, it is convenient to be conservative and interpret
the failure to reject the null model as a treebank-specific result that cannot
be ascribed to a general property of the involved languages or an absence of
dependency length minimization in them.

Given all the preceding considerations, our results and previous work [25, 26]
provide support for the hypothesis that dependency crossings are a side effect of
dependency lengths. By not requiring a belief in an autonomous ban of crossings
[17, 18, 20, 21], this hypothesis promises to help develop a more parsimonious
theory of syntax.
Appendix


The traditional view could be recast as a simple model that predicts, given a
sentence, a zero number of crossings. This deterministic model with no
parameter could be generalized as a stochastic model with one parameter a that defines
the expected number of crossings. Suppose that $E[C \mid \mathit{sentence}]$ is the
expectation of C over all possible orderings of a sentence. Then the traditional
view could be defined as

$$E[C \mid \mathit{sentence}] = a, \tag{7}$$

where a is a constant such that $a \geq 0$. a = 0 implies a ban of crossings because
$C \geq 0$. The parameter a allows one to model crossings in languages with
varying frequencies of crossings (from languages where there are no crossings to
languages where crossings occur with a certain frequency).

If the relevant information of a sentence is D, the sum of dependency lengths
(see Fig. 1 for examples of D), the alternative hypothesis could be modeled
simply as

$$E[C \mid D] = g(D), \tag{8}$$

where g is a function of D, and then the traditional hypothesis could be written
as

$$E[C \mid D] = a. \tag{9}$$

A limitation of $E[C \mid D]$ is that it is defined over a set of possible linearizations
that includes some that are very unlikely, cognitively harder or
“ungrammatical”. In this article, we focus on real linearizations and therefore we consider
$E_{TB}[C \mid D]$, the expectation of C given D over the ensemble of linearizations
of the sentences of a treebank (TB). $E_{TB}[C \mid D]$ needs to be refined: the
distribution of D depends on the length of the sentence and then values of D
from sentences of different length should not be mixed. The same kind of
problem is also likely to concern C. For this reason, instead of $E_{TB}[C \mid D]$, we
choose $E_{TB}[C \mid n, D]$, i.e. the expectation of C conditioning on sentences of the
treebank that have length n and whose sum of dependency lengths is D.

Now we will explain why $a_{TB}$ depends on n by means of a key concept of
crossing theory: Q, namely the set of pairs of edges that may potentially cross
when their vertices are arranged linearly [37]. By definition, $C \leq |Q|$, where
$|Q|$ is the cardinality of Q. When n > 1, we have that [37]

$$|Q| = \frac{n}{2}\left(n - 1 - \langle k^2 \rangle\right), \tag{10}$$

where $\langle k^2 \rangle$ is the degree’s second moment about zero. Knowing that
$\langle k^2 \rangle_{linear} \leq \langle k^2 \rangle$, where $\langle k^2 \rangle_{linear}$ is the
value of $\langle k^2 \rangle$ of a linear tree, and that $\langle k^2 \rangle_{linear} = 4 - 6/n$ (when
$n \geq 2$), we finally obtain

$$C \leq |Q| \leq \frac{n}{2}(n - 5) + 3 \tag{11}$$

for $n \geq 2$. For instance, this implies that $a_{TB}(n) = 0$ for n < 4 (since C = 0
in this case) and that $0 \leq a_{TB}(4) \leq 1$. It is clear that one cannot set
$a_{TB}(n)$ to a number greater than 2 when n < 4 because it cannot be reached
by $E_{TB}[C \mid n, D]$. In general ($n \geq 2$), $a_{TB}(n) > \frac{n}{2}(n - 5) + 3$ is impossible to
achieve. This is why $a_{TB}$ depends on n a priori.
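The bound of Eq. 11 can be illustrated numerically. The sketch below counts the pairs of edges of a tree that do not share a vertex (only those pairs can cross) from the vertex degrees, and compares the result with the bound for linear trees and star trees. The function names are introduced for this example, and the count assumes the input degrees are those of a tree.

```python
def potential_crossings(degrees):
    """|Q| for a tree with the given vertex degrees.

    A tree of n vertices has n - 1 edges, and a vertex of degree k is
    shared by k * (k - 1) / 2 pairs of edges. Subtracting the shared pairs
    from all pairs of edges gives (n * (n - 1) - sum of k^2) / 2.
    """
    n = len(degrees)
    return (n * (n - 1) - sum(k * k for k in degrees)) // 2


def upper_bound(n):
    """Right-hand side of Eq. 11: (n / 2) * (n - 5) + 3."""
    return n * (n - 5) // 2 + 3  # n * (n - 5) is always even


for n in range(2, 9):
    linear = [1] + [2] * (n - 2) + [1]   # a path of n vertices
    star = [n - 1] + [1] * (n - 1)       # one hub and n - 1 leaves
    print(n, potential_crossings(linear), potential_crossings(star),
          upper_bound(n))
```

Linear trees reach the bound for every n, while star trees have no pair of edges that could cross.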